# Image-Text Interaction
Gemma 3 4B IT QAT GGUF
Google's Gemma 3 4B IT model accepts multimodal input and handles long contexts, making it suitable for text generation and image understanding tasks.
Image-to-Text
lmstudio-community
46.55k
10
Gemma 3 27B IT INT4 GGUF
Gemma 3 is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology as the Gemini models. It accepts text and image input, produces text output, and is available in both pretrained and instruction-tuned variants.
Image-to-Text
gaunernst
232
3
BLIP VQA Base
BSD-3-Clause
BLIP is a unified vision-language pretraining framework that is trained jointly on images and text, giving it both multimodal understanding and generation capabilities. It performs well on visual question answering tasks.
Image-to-Text
Transformers

Salesforce
1.9M
154
Featured Recommended AI Models